Imports

Run the following command to read strain names and phenotypes from the snippy output file "core.vcf":

$ grep '^CP001217' core.vcf > new_core.txt
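The same filter can be sketched in Python, which is handy when the rest of the analysis runs in a notebook. This is a minimal sketch, assuming the only requirement is to keep records whose line starts with the contig ID `CP001217`; the function name `filter_records` is hypothetical and the example records are made up.

```python
def filter_records(lines, prefix="CP001217"):
    """Keep only lines that start with the contig ID (like grep '^CP001217')."""
    return [line for line in lines if line.startswith(prefix)]


# Tiny in-memory example with made-up records (not real data).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "CP001217\t147303\t.\tG\tT\n",
]
kept = filter_records(example)
print(len(kept))
```

To apply it to the real file, pass an open file handle instead of the list, for example `with open("core.vcf") as src: kept = filter_records(src)`, and write `kept` to "new_core.txt".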

Counting synonymous variants in the dataset

Removing synonymous variants from the dataset
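One way to count and drop synonymous variants is sketched below. It rests on an assumption the notes do not confirm: that each record carries a snpEff-style `ANN=` entry in the INFO column, with the Sequence Ontology term `synonymous_variant` as the effect. The helper names and the two example records are hypothetical.

```python
def effects_of(info):
    """Collect effect terms from a snpEff-style ANN entry of a VCF INFO string."""
    terms = set()
    for field in info.split(";"):
        if field.startswith("ANN="):
            for annotation in field[4:].split(","):
                parts = annotation.split("|")
                if len(parts) > 1:
                    # The second sub-field holds the effect; '&' joins several effects.
                    terms.update(parts[1].split("&"))
    return terms


def is_synonymous(record):
    """True if the INFO column (8th VCF column) lists synonymous_variant."""
    columns = record.rstrip("\n").split("\t")
    return len(columns) > 7 and "synonymous_variant" in effects_of(columns[7])


def split_synonymous(records):
    """Return (number of synonymous records, list of the remaining records)."""
    kept = [r for r in records if not is_synonymous(r)]
    return len(records) - len(kept), kept


# Made-up records for illustration only.
records = [
    "CP001217\t10\t.\tA\tG\t.\t.\tANN=G|synonymous_variant|LOW|geneA\n",
    "CP001217\t20\t.\tC\tT\t.\t.\tANN=T|missense_variant|MODERATE|geneA\n",
]
n_synonymous, kept = split_synonymous(records)
print(n_synonymous, len(kept))
```

Matching on the exact effect term, rather than on a substring of the whole line, avoids accidentally dropping records whose annotation merely contains the word elsewhere.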

Retrieving phenotype properties and strain order from the metadata

Correcting the order of strain names
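A minimal sketch of the reordering step, assuming the genotype table has one column per strain and that the metadata supplies the desired strain order. The function name `reorder_columns` and the strain names are hypothetical.

```python
def reorder_columns(header, rows, strain_order):
    """Reorder the sample columns of a variant table to follow strain_order."""
    index = {name: i for i, name in enumerate(header)}
    missing = [s for s in strain_order if s not in index]
    if missing:
        # Fail loudly instead of silently misaligning genotypes and phenotypes.
        raise KeyError(f"strains missing from the table: {missing}")
    picks = [index[s] for s in strain_order]
    return list(strain_order), [[row[i] for i in picks] for row in rows]


# Toy example: columns arrive as s2, s1, s3 but the metadata lists s1, s2, s3.
header = ["s2", "s1", "s3"]
rows = [[0, 1, 0], [1, 1, 0]]
new_header, new_rows = reorder_columns(header, rows, ["s1", "s2", "s3"])
print(new_header, new_rows)
```

Raising on a missing strain is a deliberate choice here: a silent mismatch between genotype columns and phenotype labels would corrupt every downstream model.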

Transforming data for training
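The transformation can be sketched as a transpose from "one row per variant" to "one row per strain", with each variant named `CHROM:POS:REF:ALT`. That naming pattern is modeled on the feature ID `1:147303:G:T` that appears later in these notes; the tuple layout of the input and the function name are assumptions made for illustration.

```python
def to_feature_matrix(records):
    """Convert variant records into (feature_names, matrix) with one row per sample.

    Each record is (chrom, pos, ref, alt, genotypes), where genotypes holds one
    0/1 call per sample, in a fixed sample order.
    """
    names, columns = [], []
    for chrom, pos, ref, alt, genotypes in records:
        names.append(f"{chrom}:{pos}:{ref}:{alt}")
        columns.append([int(g) for g in genotypes])
    # Transpose: variants were rows, samples become rows.
    matrix = [list(row) for row in zip(*columns)]
    return names, matrix


# Made-up records for three samples.
records = [
    ("1", 147303, "G", "T", ["0", "1", "0"]),
    ("1", 200, "A", "C", ["1", "1", "0"]),
]
feature_names, X = to_feature_matrix(records)
print(feature_names)
print(X)
```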

Hyperparameter tuning

1. Gini vs. Entropy

2. n_estimators

3. Depth of the tree

4. Maximum features

5. Minimum samples leaf
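The five settings listed above map onto parameters of scikit-learn's `RandomForestClassifier` (`criterion`, `n_estimators`, `max_depth`, `max_features`, `min_samples_leaf`). The sketch below assumes a random forest is the model being tuned, which the notes imply but do not state; the toy data and grid values are placeholders, not the values used in the analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy binary genotype matrix; the label simply copies the first feature.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, 8))
y = X[:, 0]

# One entry per tuning step in the list above (placeholder values).
param_grid = {
    "criterion": ["gini", "entropy"],
    "n_estimators": [10, 25],
    "max_depth": [None, 3],
    "max_features": ["sqrt", 0.5],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_)
```

A joint grid search is only one option. Tuning each parameter in turn, as the numbered list suggests, is cheaper but can miss interactions between parameters.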

Initial Model and Variable selection

Distribution of Variable Importance Measure values for all features

Understanding Model Accuracy with respect to VIM or number of features
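One way to relate accuracy to the number of features is to rank features by importance and re-evaluate the model on the top `k` of them. The sketch below assumes a random forest and uses toy data; the values of `k` are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy data: the label depends on the first two features only.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(80, 10))
y = X[:, 0] | X[:, 1]

# Rank features by importance, highest first.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
ranking = np.argsort(model.feature_importances_)[::-1]

# Cross-validated accuracy using only the top-k ranked features.
curve = []
for k in (1, 2, 5, 10):
    top_k = ranking[:k]
    scores = cross_val_score(
        RandomForestClassifier(n_estimators=50, random_state=0),
        X[:, top_k],
        y,
        cv=3,
    )
    curve.append((k, float(scores.mean())))

for k, accuracy in curve:
    print(k, round(accuracy, 3))
```

Note that ranking features on the full dataset before cross-validating leaks information into the folds, so the curve is optimistic. Ranking inside each training fold gives a fairer estimate.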

Plot 1

Plot 2

Selection of variables above VIM 0.0002
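The cutoff step can be sketched as a simple filter. The threshold 0.0002 comes from the heading above; the function name, the feature names, and the importance values are hypothetical.

```python
VIM_CUTOFF = 0.0002  # cutoff named in the heading above


def select_above(importances, cutoff=VIM_CUTOFF):
    """Return feature names with importance strictly above cutoff, highest first."""
    kept = [(name, value) for name, value in importances.items() if value > cutoff]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


# Made-up importances for illustration.
importances = {
    "variant_a": 0.0,
    "variant_b": 0.0005,
    "variant_c": 0.0002,
    "variant_d": 0.01,
}
selected = select_above(importances)
print(selected)
```

The comparison is strict, so a feature whose importance equals the cutoff exactly is dropped. Switch to `>=` if "above" is meant inclusively.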

Running the model after the soft cutoff

Correlation Among Selected Features

Correlation Among Selected Features and Feature Clustering
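A possible approach to grouping correlated features is hierarchical clustering on the distance `1 - |r|`, where `r` is the Pearson correlation between two features. This is a sketch under assumptions: the notes do not say which correlation measure, linkage method, or threshold was used, so average linkage and a cut at 0.5 are placeholders.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Toy data: three independent binary features plus a copy of the first one.
rng = np.random.default_rng(2)
base = rng.integers(0, 2, size=(100, 3)).astype(float)
X = np.column_stack([base, base[:, 0]])

# Pairwise Pearson correlation between features (columns).
corr = np.corrcoef(X, rowvar=False)

# Turn correlation into a distance; clean up floating-point noise.
dist = np.clip(1.0 - np.abs(corr), 0.0, None)
dist = (dist + dist.T) / 2.0
np.fill_diagonal(dist, 0.0)

# Average-linkage clustering, cut where the distance exceeds 0.5.
tree = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(tree, t=0.5, criterion="distance")
print(labels)
```

Features that share a label are strongly correlated, so keeping one representative per cluster is a common way to reduce redundancy before interpreting importances.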

For class imbalance

VIM
Boruta
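The idea behind Boruta can be illustrated with a single round of shadow-feature comparison: shuffle each column to create "shadow" features, train a forest on real and shadow features together, and keep real features whose importance beats the best shadow. This is a simplified sketch, not the full Boruta procedure (which repeats the comparison many times and applies a statistical test to the hit counts), and not necessarily what the notebook ran.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data: only feature 0 carries signal.
rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(200, 5))
y = X[:, 0]

# Shadow features: each column shuffled independently, which destroys
# any relationship with the label while preserving the value distribution.
shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
augmented = np.hstack([X, shadows])

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(augmented, y)
importances = forest.feature_importances_
real, shadow = importances[: X.shape[1]], importances[X.shape[1]:]

# A "hit" is a real feature that outperforms the strongest shadow feature.
hits = real > shadow.max()
print(hits)
```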

Data Wrangling for final list of features with meta data

SHAP

For SHAP clustering

Class Imbalance
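One common remedy for class imbalance is to weight each class inversely to its frequency. The sketch below computes weights as `n_samples / (n_classes * class_count)`, the same formula scikit-learn uses for `class_weight="balanced"`. Whether the notebook used weighting or resampling is not stated, so treat this as one option; the labels are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up labels: eight samples of class 0, two of class 1.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)
```

The resulting dictionary can be passed as the `class_weight` argument of scikit-learn classifiers that accept it, such as `RandomForestClassifier`.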

BacGWASim Modeling

projects/hpylori/bacgwasim/recomb_0.2/results_BacGWASim/simulations/phenSim/0/phenSim.phen:0
projects/hpylori/bacgwasim/recomb_0.2/results_BacGWASim/simulations/phenSim/1/phenSim.phen:0
projects/hpylori/bacgwasim/recomb_0.1/results_BacGWASim/simulations/phenSim/0/phenSim.phen:0

Importing pickle file for modeling
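Loading a pickled object takes only a few lines. The sketch below writes a small made-up payload to a temporary file and reads it back; the file name, payload layout, and `load_pickle` helper are hypothetical, since the notes do not say what the pickle contains.

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load a Python object from a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip with a made-up payload, so the example runs on its own.
payload = {"features": ["1:147303:G:T"], "labels": [0]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.pkl")
    with open(path, "wb") as handle:
        pickle.dump(payload, handle)
    restored = load_pickle(path)

print(restored == payload)
```

Only unpickle files from a trusted source, because loading a pickle can execute arbitrary code.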

Idea: associate the phenotype column with each (average values / mean / variance), looking only at non-synonymous variants. This could be interesting, but it cannot be done because the data are simulated.

Modeling

Boruta on BacGWASim

VIM for BacGWASim

This doesn't match the two VIM lists.
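To quantify how far two ranked feature lists agree, one option is the overlap of their top `k` entries, reported as the shared features and a Jaccard index. This is a sketch with hypothetical names and made-up rankings, not a reproduction of the comparison made in the notebook.

```python
def top_k_overlap(ranking_a, ranking_b, k):
    """Return (sorted shared features, Jaccard index) for the top-k of two rankings."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    shared = top_a & top_b
    union = top_a | top_b
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard


# Made-up rankings, highest importance first.
ranking_a = ["v1", "v2", "v3", "v4"]
ranking_b = ["v3", "v9", "v1", "v7"]
shared, jaccard = top_k_overlap(ranking_a, ranking_b, k=3)
print(shared, jaccard)
```

A Jaccard index of 1.0 means the two top-`k` sets are identical, and 0.0 means they share nothing.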

        Features      Importance
218714  1:147303:G:T  0.0